Business AI Meeting Companion STT


Introduction

Imagine you're attending a business meeting where every conversation is captured by an AI application. The application not only transcribes the discussion with high accuracy but also produces a concise summary of the meeting that highlights the key points and the decisions made.

In this project, we'll use OpenAI's Whisper to transcribe speech into text. Next, we'll use IBM Watson's AI to summarize the transcript and extract its key points. Finally, we'll build the app's user interface with Hugging Face Gradio.

Learning Objectives

After finishing this lab, you will be able to:

  • Create a Python script that generates text using a model from the Hugging Face Hub, identify some key parameters that influence the model's output, and gain a basic understanding of how to switch between different LLMs.
  • Use OpenAI's Whisper technology to accurately convert lecture recordings into text.
  • Implement IBM Watson's AI to effectively summarize the transcribed lectures and extract key points.
  • Create an intuitive and user-friendly interface using Hugging Face Gradio, ensuring ease of use for students and educators.

Preparing the environment

Let's start with setting up the environment by creating a Python virtual environment and installing the required libraries, using the following commands in the terminal:

    pip3 install virtualenv
    virtualenv my_env            # create a virtual environment named my_env
    source my_env/bin/activate   # activate my_env
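
To confirm that the activation took effect, a quick check from Python can help. The snippet below is only a minimal sketch, assuming a standard CPython interpreter; the function name `in_virtualenv` is made up for illustration and is not part of the lab.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv releases (before version 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and current virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line reports False, the terminal is most likely still using the system interpreter, and the `source my_env/bin/activate` step needs to be repeated in that terminal.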

Then, install the required libraries in the environment (this will take time ☕️☕️):

    # install the required libraries in my_env
    pip install transformers==4.35.2 torch==2.1.1 gradio==4.44.0 langchain==0.0.343 ibm_watson_machine_learning==1.0.335 huggingface-hub==0.19.4

Have a cup of coffee; the installation will take a few minutes.

             )  (
            (   ) )
             ) ( (
           _______)_
        .-'---------|
       ( C|/\/\/\/\/|
        '-./\/\/\/\/|
          '_________'
           '-------'
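
Once the installation finishes, it may be worth checking that the pinned versions actually landed in the environment. The following is a rough sketch using only the standard library; the version numbers are copied from the pip command above, while the names `PINNED`, `installed_version`, and `check_pins` are hypothetical helpers introduced here for illustration.

```python
from importlib import metadata

# Versions copied from the pip install command in this lab.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.1.1",
    "gradio": "4.44.0",
    "langchain": "0.0.343",
    "ibm_watson_machine_learning": "1.0.335",
    "huggingface-hub": "0.19.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each mismatching package to a tuple (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The comparison is an exact string match, so a build that carries a local tag (for example a torch version ending in `+cpu`) would be reported as a mismatch even though the release number is the same.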

We also need to install ffmpeg to work with audio files in Python. First, refresh the package index:

    sudo apt update

Then run:

    sudo apt install ffmpeg -y
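
Because the audio tooling depends on the `ffmpeg` executable being reachable, a small check from Python can save debugging time later. This is a sketch under the assumption that ffmpeg was installed onto the PATH by the commands above; `find_ffmpeg` and `ffmpeg_version_line` are illustrative names, not part of any library.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; rerun the apt commands above.")
    else:
        print(ffmpeg_version_line(path))
```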

Whisper from OpenAI is available on GitHub. Whisper's code and model weights are released under the MIT License. See the LICENSE file in the Whisper repository for further details.